HUGO GIMBERT AND FLORIAN HORN 



OPTIMAL STRATEGIES IN
PERFECT-INFORMATION STOCHASTIC GAMES
WITH TAIL WINNING CONDITIONS




LaBRI, CNRS, Bordeaux, France 
e-mail address: hugo.gimbert@labri.fr 

LIAFA, Université Paris 7, Paris, France 
e-mail address: florian.horn@liafa.jussieu.fr 



Abstract. We prove that optimal strategies exist in perfect-information stochastic games 
with finitely many states and actions and tail winning conditions. 



Introduction 

We prove that optimal strategies exist in perfect-information stochastic games with 
finitely many states and actions and tail winning conditions. 
This proof is different from the algorithmic proof sketched in [Hor08]. 



1. Perfect-Information Stochastic Games 

In this section we give formal definitions of perfect-information stochastic games, values 
and optimal strategies. 

1.1. Games, plays and strategies. A (perfect-information stochastic) game is a tuple 
$(V, V_{\mathrm{Max}}, V_{\mathrm{Min}}, V_R, E, W, p)$, where $(V,E)$ is a finite graph, $(V_{\mathrm{Max}}, V_{\mathrm{Min}}, V_R)$ is a partition 
of $V$, $W \subseteq V^\omega$ is a measurable set called the winning condition and for every $v \in V_R$ 
and $w \in V$, $p(w|v) \geq 0$ is the transition probability from $v$ to $w$, with the property 
$\sum_{w \in V} p(w|v) = 1$. 

A play is an infinite sequence $v_0 v_1 \cdots \in V^\omega$ of vertices such that if $v_n \in (V_{\mathrm{Max}} \cup V_{\mathrm{Min}})$ 
then $(v_n, v_{n+1}) \in E$ and if $v_n \in V_R$ then $p(v_{n+1}|v_n) > 0$. A play is won by Max if it belongs 
to $W$; otherwise the play is won by Min. A finite play is a finite prefix of a play. 

A strategy for player Max is a mapping $\sigma : V^* V_{\mathrm{Max}} \to V$ such that for each finite play 
$h = v_0 \cdots v_n$ such that $v_n \in V_{\mathrm{Max}}$, we have $(v_n, \sigma(h)) \in E$. A play $v_0 v_1 \cdots$ is consistent 
with $\sigma$ if for every $n$, if $v_n \in V_{\mathrm{Max}}$ then $v_{n+1}$ is $\sigma(v_0 \cdots v_n)$. A strategy for player Min is 
defined similarly, and is generally denoted $\tau$. 

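Once the initial vertex $v$ and two strategies $\sigma, \tau$ for player Max and Min are fixed, we can 
measure the probability that a given set of plays occurs. This probability measure is denoted 
$\mathbb{P}_v^{\sigma,\tau}$. For every $n \in \mathbb{N}$, we denote by $V_n$ the random variable defined by $V_n(v_0 v_1 \cdots) = v_n$; 
the set of plays is equipped with the $\sigma$-algebra generated by the random variables $(V_n)_{n \in \mathbb{N}}$. 
Then there exists a probability measure $\mathbb{P}_v^{\sigma,\tau}$ with the following properties: 

$$\mathbb{P}_v^{\sigma,\tau}(V_0 = v) = 1 , \tag{1.1}$$

$$\mathbb{P}_v^{\sigma,\tau}(V_{n+1} = \sigma(V_0 \cdots V_n) \mid V_n \in V_{\mathrm{Max}}) = 1 , \tag{1.2}$$

$$\mathbb{P}_v^{\sigma,\tau}(V_{n+1} = \tau(V_0 \cdots V_n) \mid V_n \in V_{\mathrm{Min}}) = 1 , \tag{1.3}$$

$$\mathbb{P}_v^{\sigma,\tau}(V_{n+1} \mid V_n \in V_R) = p(V_{n+1}|V_n) . \tag{1.4}$$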

The expectation of a real-valued, measurable and bounded function $\phi$ under $\mathbb{P}_v^{\sigma,\tau}$ is denoted 
$\mathbb{E}_v^{\sigma,\tau}[\phi]$. For an event $W \subseteq V^\omega$, we denote by $\mathbf{1}_W$ the indicator function of $W$. We will often 
use implicitly the following formula, which gives the expectation of $\phi$ once a finite prefix 
$h = v_0 v_1 \cdots v_n$ of the play is fixed: 

$$\mathbb{E}_v^{\sigma,\tau}[\phi \mid V_0 \cdots V_n = h] = \mathbb{E}_{v_n}^{\sigma[h],\tau[h]}[\phi[h]] , \tag{1.5}$$

where $\sigma[h](w_0 w_1 w_2 \cdots) = \sigma(v_0 \cdots v_n w_1 w_2 \cdots)$ and $\tau[h]$ and $\phi[h]$ are defined similarly. 

1.2. Values. The goal of player Max is to satisfy the winning condition with the highest 
probability possible, whereas player Min has the opposite goal. Given a starting vertex $v$ 
and a strategy $\sigma$ for player Max, whatever strategy $\tau$ is chosen by Min, the play will be 
won with probability at least: 

$$\inf_{\tau} \mathbb{P}_v^{\sigma,\tau}(W) .$$

Thus, starting from $v$, player Max can ensure winning the game with probability arbitrarily 
close to: 

$$\mathrm{val}_*(v) = \sup_{\sigma} \inf_{\tau} \mathbb{P}_v^{\sigma,\tau}(W) ,$$

and symmetrically, player Min can ensure the play is not won with probability much higher 
than: 

$$\mathrm{val}^*(v) = \inf_{\tau} \sup_{\sigma} \mathbb{P}_v^{\sigma,\tau}(W) .$$

Clearly $\mathrm{val}_*(v) \leq \mathrm{val}^*(v)$. According to Martin's theorem [Mar98] these values are equal, 
and this common value is called the value of vertex $v$ and denoted $\mathrm{val}(v)$. 

1.3. Optimal and $\epsilon$-optimal strategies. By definition of the value, for each $\epsilon > 0$ there 
exist $\epsilon$-optimal strategies $\sigma_\epsilon$ for player Max and $\tau_\epsilon$ for player Min such that for every vertex $v$, 

$$\inf_{\tau} \mathbb{P}_v^{\sigma_\epsilon,\tau}(W) \geq \mathrm{val}(v) - \epsilon ,$$

and symmetrically for player Min, 

$$\sup_{\sigma} \mathbb{P}_v^{\sigma,\tau_\epsilon}(W) \leq \mathrm{val}(v) + \epsilon .$$

For several classes of winning conditions, it is known that there exist optimal strategies, 
i.e. strategies that are $\epsilon$-optimal for every $\epsilon$. 

In this paper, we prove that optimal strategies exist in games whose winning condition 
has the following property. 

Definition 1.1. A winning condition $W \subseteq V^\omega$ is a tail winning condition if for every finite 
play $p \in V^*$ and infinite play $q \in V^\omega$, 

$$(q \in W) \iff (pq \in W) .$$
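As an illustration of Definition 1.1, here is a minimal Python sketch. It assumes plays are ultimately periodic and encoded as a pair (finite prefix, cycle repeated forever), and it uses a Büchi-style condition ("some target vertex is visited infinitely often") as an example of a tail winning condition. The names `Lasso`, `buchi_wins` and `prepend` are hypothetical and do not come from the paper.

```python
from typing import List, Set, Tuple

# An ultimately periodic play: a finite prefix followed by a non-empty cycle
# that is repeated forever. This encoding is an assumption of the sketch.
Lasso = Tuple[List[str], List[str]]

def buchi_wins(play: Lasso, targets: Set[str]) -> bool:
    """Max wins iff a target vertex is visited infinitely often,
    i.e. iff a target vertex occurs on the cycle of the lasso."""
    _prefix, cycle = play
    if not cycle:
        raise ValueError("the cycle of a lasso play must be non-empty")
    return any(v in targets for v in cycle)

def prepend(p: List[str], q: Lasso) -> Lasso:
    """Return the play pq for a finite play p and an infinite play q."""
    prefix, cycle = q
    return (list(p) + list(prefix), list(cycle))

targets = {"t"}
q_lost = (["a", "t"], ["b", "c"])  # sees t once, then loops on b, c forever
q_won = (["a"], ["b", "t"])        # loops through t forever
p = ["t", "t", "a"]                # an arbitrary finite prefix

# Prepending a finite prefix never changes the winner.
same_lost = buchi_wins(prepend(p, q_lost), targets) == buchi_wins(q_lost, targets)
same_won = buchi_wins(prepend(p, q_won), targets) == buchi_wins(q_won, targets)
```

By contrast, a condition such as "the play visits t at least once" is not a tail condition, since prepending `["t"]` to a losing play would make it winning.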
Games with tail winning conditions have the following properties. 

Lemma 1.2. Let $G$ be a game with a tail winning condition $W$. Then for every vertex 
$v \in V$, 

$$\mathrm{val}(v) = \begin{cases} \max_{(v,w) \in E} \mathrm{val}(w) & \text{if } v \in V_{\mathrm{Max}} , \\ \min_{(v,w) \in E} \mathrm{val}(w) & \text{if } v \in V_{\mathrm{Min}} , \\ \sum_{(v,w) \in E} p(w|v)\, \mathrm{val}(w) & \text{if } v \in V_R . \end{cases}$$

Proof. This comes from (1.5) and the fact that $\mathbf{1}_W[h] = \mathbf{1}_W$, because $W$ is a tail winning 
condition. □ 
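The three equations of Lemma 1.2 can be checked mechanically on a small example. The sketch below is only an illustration: the five-vertex game, the dictionaries `owner`, `edges`, `prob` and the candidate value vector `val` are hypothetical and supplied by hand (they are the values one would expect if the winning condition were "visit `win` infinitely often"). The equations are necessary conditions on the values, so the function can refute a candidate vector but cannot certify it.

```python
from fractions import Fraction as F

# Hypothetical game: who controls each vertex, the edges out of player
# vertices, and the transition probabilities out of the random vertex.
owner = {"a": "Max", "b": "Min", "r": "Random", "win": "Max", "lose": "Max"}
edges = {"a": ["r", "lose"], "b": ["a", "win"], "win": ["win"], "lose": ["lose"]}
prob = {"r": {"win": F(1, 3), "lose": F(2, 3)}}

def satisfies_lemma_1_2(val):
    """Check the max / min / average equations at every vertex."""
    for v, who in owner.items():
        if who == "Max":
            expected = max(val[w] for w in edges[v])
        elif who == "Min":
            expected = min(val[w] for w in edges[v])
        else:  # random vertex: weighted average of the successors
            expected = sum(p * val[w] for w, p in prob[v].items())
        if val[v] != expected:
            return False
    return True

# Candidate values, chosen by hand for this example.
val = {"a": F(1, 3), "b": F(1, 3), "r": F(1, 3), "win": F(1), "lose": F(0)}
ok = satisfies_lemma_1_2(val)

# Lowering the value of "a" breaks the equation at "a" (max is still 1/3).
bad = dict(val, a=F(0))
refuted = not satisfies_lemma_1_2(bad)
```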

2. Optimal strategies in games with tail winning conditions 
Our main result is: 

Theorem 2.1. In every perfect-information stochastic game with tail winning condition 
and finitely many states and actions, both players have optimal strategies. 

The proof of this theorem relies on several intermediary results. 

2.1. Consistent games. The next lemma states that it is enough to prove Theorem 2.1 in the 
case where no move of player Max can decrease the value of a vertex and no move of player 
Min can increase the value of a vertex. 

Lemma 2.2. Let $G$ be a game with a tail winning condition $W$. We say an edge $(v,w)$ 
is superfluous when either $v \in V_{\mathrm{Max}}$ and $\mathrm{val}_G(w) < \mathrm{val}_G(v)$ or $v \in V_{\mathrm{Min}}$ and $\mathrm{val}_G(w) > 
\mathrm{val}_G(v)$. Let $G'$ be the game obtained from $G$ by removing all superfluous edges. If there are 
optimal strategies in $G'$ then there are optimal strategies in $G$ as well. 
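The construction of $G'$ in Lemma 2.2 is a simple filter on edges once the values are known. The following Python sketch assumes the values $\mathrm{val}_G$ are given as input (computing them is outside the scope of the sketch); the game and the names `remove_superfluous` and `value_preserving` are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical game and hand-supplied values val_G (assumptions of the sketch).
owner = {"a": "Max", "b": "Min", "win": "Max", "lose": "Max", "r": "Random"}
edges = {"a": ["r", "lose"], "b": ["a", "win"], "win": ["win"], "lose": ["lose"]}
val = {"a": F(1, 3), "b": F(1, 3), "r": F(1, 3), "win": F(1), "lose": F(0)}

def remove_superfluous(owner, edges, val):
    """Drop every edge (v, w) where Max would lose value or Min would give value."""
    kept = {}
    for v, succs in edges.items():
        if owner[v] == "Max":
            kept[v] = [w for w in succs if not val[w] < val[v]]
        elif owner[v] == "Min":
            kept[v] = [w for w in succs if not val[w] > val[v]]
        else:
            kept[v] = list(succs)  # random vertices are left untouched
    return kept

def value_preserving(owner, edges, val):
    """True when every edge out of a Max or Min vertex keeps the value unchanged."""
    return all(
        val[v] == val[w]
        for v, succs in edges.items()
        if owner[v] in ("Max", "Min")
        for w in succs
    )

pruned = remove_superfluous(owner, edges, val)
```

In this example the edges `("a", "lose")` and `("b", "win")` are removed, and every remaining move of Max or Min preserves the value of the current vertex with respect to the supplied vector.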

Proof. We prove the statement for the game $G'$ obtained by removing 
only one of the superfluous edges; Lemma 2.2 then results from a trivial induction. 

Let $(v_s, w_s)$ be the superfluous edge removed. Without loss of generality, suppose 
$v_s \in V_{\mathrm{Max}}$, and let 

$$m = \mathrm{val}_G(v_s) - \mathrm{val}_G(w_s) > 0 .$$

Suppose there exist optimal strategies $\sigma', \tau'$ in $G'$. 

In game $G$, player Max has more freedom than in game $G'$, and from every vertex $v$ 
player Max can guarantee the probability to win to be at least $\mathrm{val}_{G'}(v)$; for that player Max 
can use its strategy $\sigma'$ for $G'$, which is a strategy in $G$ as well. 

We are going to show that this is the best that player Max can expect in $G$: we are 
going to build a strategy $\tau$ that prevents the probability to win from being greater than $\mathrm{val}_{G'}$. As 
a consequence, $\sigma'$ and $\tau$ are a couple of optimal strategies in $G$, which proves the lemma. 

The strategy $\tau$ is as follows. As long as player Max does not choose the superfluous 
edge $(v_s, w_s)$, the play is a play in $G'$ and strategy $\tau$ consists in playing like the strategy $\tau'$ 
in $G'$. If at some moment player Max chooses the superfluous edge $(v_s, w_s)$ then strategy 
$\tau$ forgets the prefix of the play and switches definitively to an $\frac{m}{2}$-optimal strategy $\tau_m$ in $G$. 
If subsequently player Max chooses the superfluous edge again, nothing special happens: $\tau$ 
keeps playing according to $\tau_m$. Let $\mathrm{Superf}$ be the event defined by: 

$$\mathrm{Superf} = \{\exists n \in \mathbb{N}, (V_n, V_{n+1}) = (v_s, w_s)\} ,$$

then the definition of $\tau$ and $m$ ensures that for any strategy $\sigma$ and vertex $v$, 

$$\mathbb{P}_v^{\sigma,\tau}(W \mid \mathrm{Superf}) \leq \mathrm{val}_G(w_s) + \frac{m}{2} = \mathrm{val}_G(v_s) - \frac{m}{2} . \tag{2.1}$$

That way we have an upper bound on the probability to win when the play does go 
through the superfluous edge. In case the play does not go through the superfluous edge, 
we prove: 

$$\mathbb{P}_{v_s}^{\sigma,\tau}(W \mid \neg \mathrm{Superf}) \leq \mathrm{val}_{G'}(v_s) . \tag{2.2}$$

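For this, we use the following transformation of $\sigma$ into a strategy $\sigma_s$ in $G'$. Strategy $\sigma_s$ 
plays similarly to $\sigma$ as long as strategy $\sigma$ does not play the superfluous edge $(v_s, w_s)$. If 
after a finite play $v_0 \cdots v_n$, with $v_n = v_s$, strategy $\sigma$ is about to choose the superfluous 
edge $(v_s, w_s)$, then $\sigma_s$ stops playing similarly to $\sigma$. Instead, strategy $\sigma_s$ forgets the past 
and switches definitively to the strategy $\sigma'$ optimal in $G'$; in other words, for every play $p$, 
$\sigma_s(v_0 \cdots v_n p) = \sigma'(p)$. We denote by $\mathrm{Switch}_\sigma$ the event: 

$$\mathrm{Switch}_\sigma = \{\exists n \in \mathbb{N}, V_n = v_s \text{ and } \sigma(V_0, \ldots, V_n) = w_s\} .$$

Then by definition of $\sigma_s$, for every strategy $\sigma$ and vertex $v$, 

$$\mathbb{P}_v^{\sigma_s,\tau}(W \mid \neg \mathrm{Switch}_\sigma) = \mathbb{P}_v^{\sigma,\tau}(W \mid \neg \mathrm{Superf}) , \tag{2.3}$$

$$\mathbb{P}_v^{\sigma_s,\tau}(W \mid \mathrm{Switch}_\sigma) \geq \mathrm{val}_{G'}(v_s) . \tag{2.4}$$

Since $\sigma_s$ is a strategy in $G'$, $\mathbb{P}_{v_s}^{\sigma_s,\tau}(W) = \mathbb{P}_{v_s}^{\sigma_s,\tau'}(W) \leq \mathrm{val}_{G'}(v_s)$ because $\tau'$ is optimal 
in $G'$. Since $\mathbb{P}_{v_s}^{\sigma_s,\tau}(W)$ is a convex combination of $\mathbb{P}_{v_s}^{\sigma_s,\tau}(W \mid \neg \mathrm{Switch}_\sigma)$ and $\mathbb{P}_{v_s}^{\sigma_s,\tau}(W \mid \mathrm{Switch}_\sigma)$, 
according to (2.4) this implies that $\mathbb{P}_{v_s}^{\sigma_s,\tau}(W \mid \neg \mathrm{Switch}_\sigma) \leq \mathrm{val}_{G'}(v_s)$. Together with (2.3) 
it proves (2.2). 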

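We can now prove that the values of $v_s$ in $G$ and $G'$ are the same: 

$$\mathrm{val}_G(v_s) = \mathrm{val}_{G'}(v_s) . \tag{2.5}$$

Indeed, for every strategy $\sigma$, $\mathbb{P}_{v_s}^{\sigma,\tau}(W)$ is a convex combination of $\mathbb{P}_{v_s}^{\sigma,\tau}(W \mid \mathrm{Superf})$ and 
$\mathbb{P}_{v_s}^{\sigma,\tau}(W \mid \neg \mathrm{Superf})$, hence according to (2.1) and (2.2), $\mathbb{P}_{v_s}^{\sigma,\tau}(W) \leq \max\{\mathrm{val}_G(v_s) - \frac{m}{2}, \mathrm{val}_{G'}(v_s)\}$. 
Taking the supremum over $\sigma$, since $m > 0$ it proves (2.5). 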

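To conclude we prove that (2.5) holds not only for $v_s$ but for any vertex $v$. Let $v$ be a 
vertex, $\sigma$ be a strategy and $\sigma_s$ the associated switch strategy. Then, since $\sigma$ and $\sigma_s$ coincide 
when the event $\mathrm{Superf}$ does not occur, 

$$\begin{aligned} \mathbb{P}_v^{\sigma,\tau}(W) &= \mathbb{P}_v^{\sigma,\tau}(W \wedge \neg \mathrm{Superf}) + \mathbb{P}_v^{\sigma,\tau}(W \mid \mathrm{Superf}) \cdot \mathbb{P}_v^{\sigma,\tau}(\mathrm{Superf}) \\ &= \mathbb{P}_v^{\sigma_s,\tau}(W \wedge \neg \mathrm{Switch}_\sigma) + \mathbb{P}_v^{\sigma,\tau}(W \mid \mathrm{Superf}) \cdot \mathbb{P}_v^{\sigma_s,\tau}(\mathrm{Switch}_\sigma) . \end{aligned} \tag{2.6}$$

According to (2.1), $\mathbb{P}_v^{\sigma,\tau}(W \mid \mathrm{Superf}) \leq \mathrm{val}_G(v_s)$, and $\mathrm{val}_G(v_s) = \mathrm{val}_{G'}(v_s)$ according to (2.5). By 
definition of $\tau$ and $\sigma_s$, $\mathbb{P}_v^{\sigma_s,\tau}(W \mid \mathrm{Switch}_\sigma) = \mathrm{val}_{G'}(v_s)$ because when the event $\mathrm{Switch}_\sigma$ occurs 
the play is consistent with optimal strategies $\sigma'$ and $\tau'$ in $G'$. Finally, $\mathbb{P}_v^{\sigma,\tau}(W \mid \mathrm{Superf}) \leq 
\mathbb{P}_v^{\sigma_s,\tau}(W \mid \mathrm{Switch}_\sigma)$, which together with (2.6) gives $\mathbb{P}_v^{\sigma,\tau}(W) \leq \mathbb{P}_v^{\sigma_s,\tau}(W)$. Since $\sigma_s$ is a 
strategy in $G'$ and $\tau'$ is optimal in $G'$, $\mathbb{P}_v^{\sigma_s,\tau}(W) = \mathbb{P}_v^{\sigma_s,\tau'}(W) \leq \mathrm{val}_{G'}(v)$. Taking the supremum over $\sigma$, 
we get $\mathrm{val}_G(v) \leq \mathrm{val}_{G'}(v)$, which achieves the proof. □ 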

We say that a game $G$ is consistent when for every edge $(v,w)$, if $v \in V_{\mathrm{Max}} \cup V_{\mathrm{Min}}$ then 
$\mathrm{val}_G(v) = \mathrm{val}_G(w)$. Consistent games have the following properties. 

Lemma 2.3. Let $G$ be a consistent game with a tail winning condition $W$. Then for every 
initial vertex $v_0$ and strategies $\sigma, \tau$, and every $n \in \mathbb{N}$, 

$$\mathbb{E}_{v_0}^{\sigma,\tau}[\mathrm{val}(V_{n+1}) \mid V_0, \ldots, V_n] = \mathrm{val}(V_n) .$$

Proof. Comes from Lemma 1.2 and the fact that the game is consistent. □ 



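2.2. Deviations. To detect bad behaviours of a strategy, we use the notions of quality and 
deviations. 

The quality of a strategy $\sigma$ after a finite play $v_0 \cdots v_n$ is 

$$h_\sigma(v_0, \ldots, v_n) = \inf_{\tau} \mathbb{P}_{v_n}^{\sigma[v_0 \cdots v_n],\tau}(W) .$$

A deviation occurs when the quality of the strategy drops significantly below the value 
of the current vertex. Formally, let 

$$m = \min\{\mathrm{val}(v) \mid v \in V, \mathrm{val}(v) > 0\}$$

be the smallest strictly positive value¹ of a vertex in $G$; the deviation date is denoted $\mathrm{dev}_\sigma$ 
and defined by: 

$$\mathrm{dev}_\sigma = \min\left\{ n \;\middle|\; h_\sigma(V_0, \ldots, V_n) < \mathrm{val}(V_n) - \frac{m}{2} \right\} ,$$

with the convention $\min \emptyset = \infty$. 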

The next lemma states that when player Max plays $\epsilon$-optimally, with $\epsilon$ small enough, 
deviations occur with probability strictly less than 1. 

Lemma 2.4. Let $G$ be a consistent game with a tail winning condition $W$. Let $\epsilon > 0$ and 
$\sigma$ be an $\epsilon$-optimal strategy. For every vertex $v$ and strategy $\tau$, 

$$\mathbb{P}_v^{\sigma,\tau}(\mathrm{dev}_\sigma < \infty) \leq \frac{1 + \epsilon}{1 + \frac{m}{2}} . \tag{2.7}$$

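Proof. Let $v_0 = v$ be the initial vertex. We start with proving: 

$$\mathbb{E}_{v_0}^{\sigma,\tau}\left[\mathrm{val}(V_{\mathrm{dev}_\sigma}) \cdot \mathbf{1}_{\mathrm{dev}_\sigma < \infty}\right] \leq \mathrm{val}(v_0) . \tag{2.8}$$

For every $n \in \mathbb{N}$ let $\mathrm{dev}_\sigma^n = \min\{n, \mathrm{dev}_\sigma\}$. According to Lemma 2.3, $\mathbb{E}_{v_0}^{\sigma,\tau}\left[\mathrm{val}(V_{\mathrm{dev}_\sigma^n})\right] = 
\mathrm{val}(v_0)$, hence $\mathbb{E}_{v_0}^{\sigma,\tau}\left[\mathrm{val}(V_{\mathrm{dev}_\sigma^n}) \cdot \mathbf{1}_{\mathrm{dev}_\sigma \leq n}\right] \leq \mathrm{val}(v_0)$. Taking the limit of the left-hand side 
of this inequality, we obtain (2.8). 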

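The main step in the proof consists in establishing, for an auxiliary strategy $\tau'$ and a parameter $\epsilon' > 0$ introduced below: 

$$\mathbb{P}_{v_0}^{\sigma,\tau'}(W \wedge \mathrm{dev}_\sigma < \infty) \leq \mathrm{val}(v_0) - \left(\frac{m}{2} - \epsilon'\right) \cdot \mathbb{P}_{v_0}^{\sigma,\tau'}(\mathrm{dev}_\sigma < \infty) . \tag{2.9}$$

Let $\epsilon' > 0$. Strategy $\tau'$ plays like strategy 
$\tau$ as long as there is no deviation, i.e. as long as $h_\sigma(v_0, \ldots, v_n) \geq \mathrm{val}(v_n) - \frac{m}{2}$. In case a 
deviation occurs, i.e. $h_\sigma(v_0, \ldots, v_n) < \mathrm{val}(v_n) - \frac{m}{2}$, strategy $\tau'$ forgets the past and 
switches definitively to an $\epsilon'$-optimal response to $\sigma[v_0, \ldots, v_n]$, so that 

$$\mathbb{P}_{v_0}^{\sigma,\tau'}(W \mid \mathrm{dev}_\sigma = n \text{ and } V_0 \cdots V_n = v_0 \cdots v_n) \leq \mathrm{val}(v_n) - \frac{m}{2} + \epsilon' . \tag{2.10}$$

Then, 

$$\begin{aligned} \mathbb{P}_{v_0}^{\sigma,\tau'}(W \wedge \mathrm{dev}_\sigma < \infty) &= \mathbb{E}_{v_0}^{\sigma,\tau'}\left[\mathbf{1}_W \cdot \mathbf{1}_{\mathrm{dev}_\sigma < \infty}\right] \\ &= \mathbb{E}_{v_0}^{\sigma,\tau'}\left[\mathbb{E}_{v_0}^{\sigma,\tau'}\left[\mathbf{1}_W \cdot \mathbf{1}_{\mathrm{dev}_\sigma < \infty} \mid \mathrm{dev}_\sigma, V_0, \ldots, V_{\mathrm{dev}_\sigma}\right]\right] \\ &= \mathbb{E}_{v_0}^{\sigma,\tau'}\left[\mathbb{E}_{v_0}^{\sigma,\tau'}\left[\mathbf{1}_W \mid \mathrm{dev}_\sigma, V_0, \ldots, V_{\mathrm{dev}_\sigma}\right] \cdot \mathbf{1}_{\mathrm{dev}_\sigma < \infty}\right] \\ &\leq \mathbb{E}_{v_0}^{\sigma,\tau'}\left[\left(\mathrm{val}(V_{\mathrm{dev}_\sigma}) - \frac{m}{2} + \epsilon'\right) \cdot \mathbf{1}_{\mathrm{dev}_\sigma < \infty}\right] \\ &= \mathbb{E}_{v_0}^{\sigma,\tau'}\left[\mathrm{val}(V_{\mathrm{dev}_\sigma}) \cdot \mathbf{1}_{\mathrm{dev}_\sigma < \infty}\right] + \left(-\frac{m}{2} + \epsilon'\right) \cdot \mathbb{P}_{v_0}^{\sigma,\tau'}(\mathrm{dev}_\sigma < \infty) \\ &\leq \mathrm{val}(v_0) + \left(-\frac{m}{2} + \epsilon'\right) \cdot \mathbb{P}_{v_0}^{\sigma,\tau'}(\mathrm{dev}_\sigma < \infty) , \end{aligned}$$

where the first three equalities are properties of conditional expectations, the first inequality 
is (2.10) and the second inequality is (2.8), which holds for every strategy of player Min and in particular for $\tau'$. This is (2.9), as 
promised. 

Now we can conclude. Since $\sigma$ is $\epsilon$-optimal, 

$$\begin{aligned} \mathrm{val}(v_0) - \epsilon \leq \mathbb{P}_{v_0}^{\sigma,\tau'}(W) &= \mathbb{P}_{v_0}^{\sigma,\tau'}(W \wedge \mathrm{dev}_\sigma < \infty) + \mathbb{P}_{v_0}^{\sigma,\tau'}(W \wedge \mathrm{dev}_\sigma = \infty) \\ &\leq \mathbb{P}_{v_0}^{\sigma,\tau'}(W \wedge \mathrm{dev}_\sigma < \infty) + 1 - \mathbb{P}_{v_0}^{\sigma,\tau'}(\mathrm{dev}_\sigma < \infty) . \end{aligned} \tag{2.11}$$

Together with (2.9) we obtain $\left(1 + \frac{m}{2} - \epsilon'\right) \cdot \mathbb{P}_{v_0}^{\sigma,\tau'}(\mathrm{dev}_\sigma < \infty) \leq 1 + \epsilon$. Since $\tau$ and $\tau'$ coincide until a deviation occurs, $\mathbb{P}_{v_0}^{\sigma,\tau'}(\mathrm{dev}_\sigma < \infty) = \mathbb{P}_{v_0}^{\sigma,\tau}(\mathrm{dev}_\sigma < \infty)$. Since this holds for every $\epsilon' > 0$, we obtain (2.7), which achieves the proof of the lemma. □ 

¹ If $\mathrm{val}(v) = 0$ for every $v \in V$ then $m = \infty$; however this case is of no interest. 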
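The proof of Lemma 2.4 yields $(1 + \frac{m}{2}) \cdot \mathbb{P}(\mathrm{dev}_\sigma < \infty) \leq 1 + \epsilon$, so the probability of a deviation is at most $(1+\epsilon)/(1+\frac{m}{2})$. A small arithmetic sketch in Python shows how this bound behaves; the function name `deviation_bound` and the sample value $m = 1/3$ are hypothetical choices made for illustration.

```python
from fractions import Fraction as F

def deviation_bound(eps, m):
    """Upper bound (1 + eps) / (1 + m/2) on the probability that a
    deviation ever occurs when Max plays an eps-optimal strategy."""
    return (1 + eps) / (1 + m / 2)

m = F(1, 3)                                  # sample smallest positive value
bound_quarter = deviation_bound(m / 4, m)    # eps = m/4: (13/12) / (7/6) = 13/14
bound_half = deviation_bound(m / 2, m)       # eps = m/2: the bound is exactly 1
bound_full = deviation_bound(m, m)           # eps = m:   (4/3) / (7/6) = 8/7
```

The bound is informative only when $\epsilon < \frac{m}{2}$; with $\epsilon = \frac{m}{4}$ it is strictly below 1, which is the choice made in the construction of the optimal strategy.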



2.3. Construction of an optimal strategy. We can now proceed with the second and 
last step in the proof of Theorem 2.1. From an $\epsilon$-optimal strategy $\sigma$, with $\epsilon$ small enough, 
we construct an optimal strategy, by resetting the memory of $\sigma$ at the right moments. A similar 
construction has been used in [Cha06] for proving a zero-one law in concurrent games with 
tail winning conditions. 

Lemma 2.5. Let $G$ be a consistent game with a tail winning condition $W$. Then player 
Max has an optimal strategy in $G$. 

Proof. If all vertices in $G$ have value 0, there is nothing to prove. 

Otherwise, let $m$ be the smallest strictly positive value of a vertex and $\sigma$ be an $\frac{m}{4}$-optimal 
strategy. Using $\sigma$, we are going to define a strategy $\sigma'$ and prove that $\sigma'$ is optimal 
in $G$. For that, we define $t(v_0, \ldots, v_n)$, the date of the latest deviation before date $n$, by 
$t(v_0) = 0$ and 

$$t(v_0, \ldots, v_n, v_{n+1}) = \begin{cases} t(v_0, \ldots, v_n) & \text{if } h_\sigma(v_{t(v_0, \ldots, v_n)}, \ldots, v_{n+1}) \geq \mathrm{val}(v_{n+1}) - \frac{m}{2} , \\ n + 1 & \text{otherwise.} \end{cases}$$

By definition the sequence $(t(V_0, \ldots, V_n))_{n \in \mathbb{N}}$ is non-decreasing; we denote by $T$ its limit in $\mathbb{N} \cup \{\infty\}$. 
Strategy $\sigma'$ consists in forgetting everything before the last deviation and applying $\sigma$, i.e. 

$$\sigma'(v_0, \ldots, v_n) = \sigma(v_{t(v_0, \ldots, v_n)}, \ldots, v_n) .$$
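The reset mechanism can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the quality $h_\sigma$ is not computable in general, so the sketch takes it as an oracle `quality` supplied by the caller, and the toy oracle, the vertex names and the helper names `reset_dates` and `sigma_prime` are all hypothetical.

```python
from fractions import Fraction as F

def reset_dates(play, quality, val, m):
    """Return [t(v_0), t(v_0 v_1), ..., t(v_0 ... v_k)] for a finite play.
    `quality(segment)` is an assumed oracle for h_sigma on a finite play."""
    t = 0
    dates = [t]
    for n in range(len(play) - 1):
        segment = play[t:n + 2]  # v_t, ..., v_{n+1}
        if quality(segment) < val[play[n + 1]] - m / 2:
            t = n + 1            # deviation detected: reset the memory here
        dates.append(t)
    return dates

def sigma_prime(play, sigma, quality, val, m):
    """Apply sigma to the suffix of the play starting at the latest reset date."""
    t = reset_dates(play, quality, val, m)[-1]
    return sigma(play[t:])

# Toy data: the quality collapses to 0 as soon as the remembered segment
# contains the vertex "bad", and equals 1/2 otherwise.
play = ["a", "b", "bad", "c", "d"]
val = {v: F(1, 2) for v in play}
m = F(1, 2)
toy_quality = lambda segment: F(0) if "bad" in segment else F(1, 2)

dates = reset_dates(play, toy_quality, val, m)
# A stand-in for sigma that just reports how much history it was given.
remembered = sigma_prime(play, len, toy_quality, val, m)
```

On this toy play the reset dates are `[0, 0, 2, 3, 3]`: the memory is reset when `bad` enters the remembered segment and once more at the next step, after which `bad` has been forgotten and no further reset happens.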

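To prove that $\sigma'$ is optimal, we start with proving that for every strategy $\tau$ and vertex $v$, 

$$\mathbb{P}_v^{\sigma',\tau}(T < \infty) = 1 . \tag{2.12}$$

Let $D = \min\{n \mid t(V_0, \ldots, V_n) \geq 1\}$ be the date of the first deviation; then, since $\sigma$ and $\sigma'$ 
coincide until the first deviation, 

$$\mathbb{P}_v^{\sigma',\tau}(D < \infty) = \mathbb{P}_v^{\sigma,\tau}(D < \infty) \tag{2.13}$$

and by definition of $\sigma'$, for every $n \in \mathbb{N}$, 

$$\mathbb{P}_v^{\sigma',\tau}(T = \infty \mid D = n, V_0 = v_0, \ldots, V_n = v_n) = \mathbb{P}_{v_n}^{\sigma',\tau[v_0 \cdots v_n]}(T = \infty) . \tag{2.14}$$

Let $\epsilon > 0$ and $\tau$ and $v$ be such that: 

$$\sup_{\tau', v'} \mathbb{P}_{v'}^{\sigma',\tau'}(T = \infty) \leq \mathbb{P}_v^{\sigma',\tau}(T = \infty) + \epsilon . \tag{2.15}$$

According to Lemma 2.4, since $\sigma$ is $\frac{m}{4}$-optimal, 

$$\mathbb{P}_v^{\sigma,\tau}(D < \infty) \leq \frac{1 + \frac{m}{4}}{1 + \frac{m}{2}} < 1 . \tag{2.16}$$

By properties of conditional expectations, 

$$\begin{aligned} \mathbb{P}_v^{\sigma',\tau}(T = \infty) &= \mathbb{E}_v^{\sigma',\tau}\left[\mathbb{P}_v^{\sigma',\tau}(T = \infty \mid D, V_0, \ldots, V_D)\right] \\ &= \mathbb{E}_v^{\sigma',\tau}\left[\mathbf{1}_{D < \infty} \cdot \mathbb{P}_v^{\sigma',\tau}(T = \infty \mid D, V_0, \ldots, V_D)\right] \\ &= \mathbb{E}_v^{\sigma',\tau}\left[\mathbf{1}_{D < \infty} \cdot \mathbb{P}_{V_D}^{\sigma',\tau[V_0 \cdots V_D]}(T = \infty)\right] \\ &\leq \mathbb{E}_v^{\sigma',\tau}\left[\mathbf{1}_{D < \infty} \cdot \left(\mathbb{P}_v^{\sigma',\tau}(T = \infty) + \epsilon\right)\right] \\ &\leq \frac{1 + \frac{m}{4}}{1 + \frac{m}{2}} \cdot \left(\mathbb{P}_v^{\sigma',\tau}(T = \infty) + \epsilon\right) \\ &\leq \frac{1 + \frac{m}{4}}{1 + \frac{m}{2}} \cdot \mathbb{P}_v^{\sigma',\tau}(T = \infty) + \epsilon , \end{aligned}$$

where the second equality holds because $\mathbb{P}_v^{\sigma',\tau}(D < \infty \mid T = \infty) = 1$, the third equality is (2.14), 
the first inequality is (2.15), and the last two inequalities come from (2.13) and (2.16). Since this holds for any 
$\epsilon$, we obtain $\mathbb{P}_v^{\sigma',\tau}(T = \infty) = 0$, i.e. (2.12). 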
The second step of the proof is to establish: 

$$\mathbb{P}_v^{\sigma',\tau}\left(\mathrm{val}(V_n) \to_n 0 \;\middle|\; h_{\sigma'}(V_0, \ldots, V_n) \to_n 0\right) = 1 . \tag{2.17}$$

When playing with $\sigma'$, suppose $h_{\sigma'}(V_0, \ldots, V_n)$ converges to 0; then by definition of $\sigma'$, 
$h_\sigma(V_{t(V_0, \ldots, V_n)}, \ldots, V_n)$ converges to 0 as well. According to (2.12), $t(V_0, \ldots, V_n)$ has limit 
$T < \infty$, hence $h_\sigma(V_T, \ldots, V_n)$ converges to 0 as well. By definition of $T$, for every $n \geq T$, 
$t(V_0, \ldots, V_n) = T$, hence $h_\sigma(V_T, \ldots, V_n) \geq \mathrm{val}(V_n) - \frac{m}{2}$. Since $h_\sigma(V_T, \ldots, V_n)$ converges to 
0, $\limsup_n \mathrm{val}(V_n) \leq \frac{m}{2}$. Hence $\mathrm{val}(V_n) \to_n 0$ because, by definition of $m$, $(\mathrm{val}(v) < 
m) \implies (\mathrm{val}(v) = 0)$. This proves (2.17). 
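
We can now achieve the proof of the optimality of $\sigma'$. Since $W$ is a tail winning 
condition, Lévy's law [Dur96] implies, 

$$\begin{aligned} \mathbb{P}_{v_0}^{\sigma',\tau}(V^\omega \setminus W) &= \mathbb{P}_{v_0}^{\sigma',\tau}\left(\mathbb{P}_{v_0}^{\sigma',\tau}(W \mid V_0, \ldots, V_n) \to_n 0\right) \\ &\leq \mathbb{P}_{v_0}^{\sigma',\tau}\left(h_{\sigma'}(V_0, \ldots, V_n) \to_n 0\right) \\ &\leq \mathbb{P}_{v_0}^{\sigma',\tau}\left(\mathrm{val}(V_n) \to_n 0\right) \\ &\leq \mathbb{E}_{v_0}^{\sigma',\tau}\left[1 - \limsup_n \mathrm{val}(V_n)\right] \\ &\leq 1 - \limsup_n \mathbb{E}_{v_0}^{\sigma',\tau}\left[\mathrm{val}(V_n)\right] \\ &= 1 - \mathrm{val}(v_0) , \end{aligned}$$

where the first inequality holds by definition of $h_{\sigma'}(v_0, \ldots, v_n)$, the second is (2.17), the 
third and fourth are basic properties of expectation and the last equality holds according 
to Lemma 2.3. This proves that $\sigma'$ is optimal in $G$. □ 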



2.4. Proof of Theorem 2.1. According to Lemma 2.2 we can suppose without loss of 
generality that $G$ is consistent. Since both the winning condition $W$ and its complement 
$V^\omega \setminus W$ are tail winning conditions, Lemma 2.5 implies that both players have optimal 
strategies in $G$. 



Conclusion 

We have proved the existence of optimal strategies in any perfect-information game 
with a tail winning condition and finitely many states and actions. We relied heavily on the finiteness of the game; in fact, the 
result does not hold in general for infinite arenas. Extending this result to certain classes 
of games with partial information or with infinitely many vertices seems to be an interesting 
research direction. 